Introduction


Breast-conserving therapy (BCT), which includes local excision and radiation treatment to the breast, has been the standard of care for early breast cancers (Stage 0-II) since five major prospective, randomized trials demonstrated that long-term survival after BCT is equivalent to that after radical mastectomy for most patients when surgical margins are clear of residual disease1-5. Of the 250,000 women diagnosed annually with breast cancer who are eligible for BCT, approximately 165,000-180,000 undergo conservative surgery6. Following breast conservation, the strongest risk factor for local recurrence and mortality is a positive resection margin (tumor cells on ink). Therefore, if a margin is positive or close, the patient is advised to undergo re-excision surgery to achieve clear margins7,8. Margin status is currently evaluated post-operatively by microscopic pathological examination of small, representative pieces of tissue. While sampled tissues are adequate for assessing tumor type, grade and receptor status, they are insufficient for evaluating important prognostic factors such as disease extent and multi-focality. Additionally, breast tissue is markedly heterogeneous, which makes it challenging to distinguish small foci of cancer within the spectrum of normal tissue using point-based probe measurements9-11.

A broadband spectroscopy platform was developed to image thick tissue samples at a resolution sensitive to the features assessed by the diagnostic gold standard, pathology12. Tissue samples were imaged over a 1 cm² field of view by translating them across a static beam with a motorized stage, permitting wide-field optical characterization of diagnostic pathology. The sampling spot size (100 µm lateral resolution) confined the probed tissue volume to within a few transport pathlengths, so that multiple-scattering effects were minimized and the spectra could be parameterized by simple empirical models. A k-Nearest Neighbor (k-NN) classifier was trained using parameters extracted from the localized scattering spectrum, automating the diagnosis of benign and malignant breast pathologies in situ with a sensitivity and specificity of 91% and 77%, respectively. Performance of the classifier was validated on 67,000 spectra from 29 excised breast tissue specimens13.

The work achieved in year one of this Department of Defense Pre-doctoral Traineeship Award characterized the spectral response of breast pathologies and automated the classification of that response according to diagnosis. Clinically feasible data acquisition speeds were attained through development of a dark-field in situ scanning-beam spectroscopy platform. Year two of the traineeship will assess the ability of the spectral imaging platform to provide immediate evaluation of surgical margins for residual cancer during breast-conserving surgery.
Task 1. Assemble and parameterize an extensive databank of scatter spectra from fresh breast tissue across clinically relevant diagnostic categories.

Materials  and  Methods 

1a. Scatter spectroscopy of fresh breast tissue

Fresh breast tissue specimens were imaged in a custom-built micro-sampling reflectance spectral imaging system14. This system employs quasi-confocal illumination and detection (510-785 nm) to constrain the overlapping illumination and detection spot sizes to within approximately one scattering distance in tissue (~100 µm in the visible). A complete description of this imaging system can be found in a previous study15. Sampling in this mesoscopic regime allows simple empirical models to describe the light transport. For the short pathlengths involved and for typical values of absorption and scattering in tissue, the measured spectral response is proportional to the reduced scattering coefficient, µs′. In regions where significant local absorption is encountered, a Beer's law-type attenuation factor is used to correct for the effects of absorption15-18.

1b. Empirical model of spectral scattering response

An empirical approximation to Mie theory was used to describe the measured reflectance spectrum, R(λ), from a volume-averaged region of tissue19. Additionally, a Beer's law attenuation factor corrected for the presence of significant local absorption by hemoglobin (Eq. 2):

$$R(\lambda) = A\,\lambda^{-b}\,\exp\left\{-\tau\,[\mathrm{HbT}]\left[S_{O_2}\,\varepsilon_{\mathrm{HbO_2}}(\lambda) + (1 - S_{O_2})\,\varepsilon_{\mathrm{Hb}}(\lambda)\right]\right\} \qquad \text{(Eq. 2)}$$

Parameters A and b are the scattering amplitude and scattering power, respectively. Both depend on the size and number density of scattering centers in the volume of interrogated tissue, thereby reflecting variations in breast tissue morphology20-22. τ refers to the mean optical pathlength, [HbT] is the total hemoglobin concentration, S_O2 is the oxygen saturation (ratio of oxygenated to total hemoglobin), and ε_HbO2 and ε_Hb are the molar extinction coefficients of oxygenated and deoxygenated hemoglobin, respectively23. These two chromophores were the dominant tissue absorbers encountered in the measured waveband. Measured reflectance spectra were fit to this model using a nonlinear least-squares solver to obtain estimates of scattering amplitude and scattering power relative to Spectralon. A measure of average scattering irradiance, I, was calculated by integrating the reflectance spectrum over a waveband that avoids the hemoglobin absorption peaks (620-785 nm). Scattering parameters were then microscopically correlated with morphological features identified by a pathologist (Wendy Wells) on hematoxylin and eosin (H&E) stained sections of the tissue, cut in exactly the same geometry as imaged in situ.
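The fitting step can be sketched with SciPy's `curve_fit`, a nonlinear least-squares solver. This is a minimal illustration, not the solver or data used in this work: the two extinction spectra below are hypothetical Gaussian stand-ins (tabulated HbO2/Hb coefficients would be substituted in practice), wavelength is expressed in micrometres so that A stays of order one, and τ and [HbT] are fitted as a single product because they only appear together in Eq. 2. The integrated irradiance I is then taken over the 620-785 nm band.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.integrate import trapezoid

wl = np.arange(510.0, 786.0, 1.0)  # detection band, nm

# HYPOTHETICAL stand-ins for tabulated molar extinction spectra of
# oxy- and deoxy-hemoglobin; replace with published tables in practice.
def eps_hbo2(lam):
    return np.exp(-((lam - 542.0) / 12.0) ** 2) + np.exp(-((lam - 577.0) / 10.0) ** 2)

def eps_hb(lam):
    return 1.2 * np.exp(-((lam - 556.0) / 20.0) ** 2) + 0.1 * np.exp(-((lam - 758.0) / 12.0) ** 2)

def reflectance_model(lam, A, b, tau_hbt, so2):
    """Eq. 2: A * lam^-b * exp(-tau*[HbT] * (SO2*eps_HbO2 + (1 - SO2)*eps_Hb)).

    tau_hbt is the product tau*[HbT]. Wavelength is converted to micrometres
    (a scaling choice for this sketch) so that A remains of order one."""
    scatter = A * (lam / 1000.0) ** (-b)
    absorption = so2 * eps_hbo2(lam) + (1.0 - so2) * eps_hb(lam)
    return scatter * np.exp(-tau_hbt * absorption)

# Synthetic "measured" spectrum relative to Spectralon, with small noise
rng = np.random.default_rng(0)
true_params = (0.02, 1.2, 0.5, 0.7)
measured = reflectance_model(wl, *true_params) + rng.normal(0.0, 1e-4, wl.size)

popt, _ = curve_fit(
    reflectance_model, wl, measured,
    p0=(0.01, 1.0, 0.2, 0.5),
    bounds=([0.0, 0.0, 0.0, 0.0], [np.inf, 4.0, 10.0, 1.0]),
)
A_fit, b_fit, tau_hbt_fit, so2_fit = popt

# Wavelength-integrated scattering irradiance over the low-absorption band
band = (wl >= 620.0) & (wl <= 785.0)
irradiance = trapezoid(measured[band], wl[band])
log_irradiance = np.log(irradiance)
print(f"b = {b_fit:.3f}, SO2 = {so2_fit:.3f}, log I = {log_irradiance:.3f}")
```

Bounding S_O2 to [0, 1] keeps the saturation physically meaningful; the remaining bounds are arbitrary choices for the sketch.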

1c. Associate scattering parameters with diagnostic pathology

All studies were completed under a protocol approved by the Committee for the Protection of Human Subjects, Institutional Review Board (IRB) at Dartmouth. Fresh breast tissue was obtained directly from the Department of Pathology at Dartmouth-Hitchcock Medical Center from patients who had given informed consent to this use of their tissue. Samples were procured during conservative surgery or breast reduction surgery, and were provided only if there was tissue in excess of that required to make a clinical diagnosis. Tissue samples were 1-2 cm² with a thickness of ~3-5 mm. Samples were imaged within 12 hours of surgery; in the case of delay, the tissue was stored in a 4°C refrigerator and hydrated with a phosphate buffer solution. Immediately following imaging, each sample was placed in 10% formalin and processed for histology (paraffin embedded, sectioned at 4 µm, and stained with H&E). A total of 35 tissue specimens were imaged; 6 were rejected from analysis due to low signal-to-noise ratio and/or poor histological processing (both a consequence of highly fatty tissue). In the remaining 29 tissue samples, 48 regions of interest (ROIs) were identified by the pathologist; these are summarized in Table 1. The pathologist identified seven tissue pathologies in the samples, and these were classified more generally as not-malignant, malignant, or adipose.


Tissue Type and Subtype                  # ROI    # Spectra
Total Not-Malignant                        25       36979
    Normal Epithelium and Stroma           21       31226
    Benign Epithelium and Stroma            3        5220
    Inflammation                            1         533
Total Malignant                            14       23220
    Ductal Carcinoma In Situ (DCIS)         1         194
    Invasive Ductal Carcinoma (IDC)        12       22547
    Invasive Lobular Carcinoma (ILC)        1         479
Total Adipose                               9        7021
    Adipose                                 9        7021
Total ROI                                  48       67220

Table 1 Distribution of the sample population according to tissue type and subtype.

Figure 1(a) illustrates co-registration between the white light image, histology and images of scattering parameters for a tissue sample. Figure 1(b-c) shows box plots of the scattering power and the logarithm of the wavelength-integrated irradiance for all tissue samples, with outliers (values more than 2 standard deviations from the mean) removed. The scattering amplitude is not displayed because it follows the same trend as scattering power across diagnostic categories, and a correlation is observed between scattering power and the logarithm of the scattering amplitude (Figure 1(d)).
Figure 1 (a) Co-registration between the digital photograph, histology and maps of scattering power, amplitude, and total-wavelength integrated intensity for a given tissue; (b-c) box plots of relative scattering power and log of the total-wavelength integrated intensity according to diagnosis (outliers > 2 std not displayed); (d) the 3-dimensional feature space assembled from the scattering parameters and employed by the k-NN classifier.

Histopathology reveals that the three macroscopic scattering centers in breast tissue are epithelium, stroma and adipose. Immunohistochemistry shows that the percent distribution of these components varies with diagnosis, and registration of scattering maps with pathology illustrates how the spectral response changes as a function of diagnosis. This suggests that the percent distribution of stroma, epithelium and adipocytes in the effective illumination volume influences the scattering response. Standard immunohistochemistry techniques were used to assess the percent distribution of adipose, stroma and epithelium per sample. Formalin-fixed, paraffin-embedded tissue sections were cut and mounted on OptiPlus™ Positive Charged Barrier slides (BioGenex, San Ramon CA) and stained for anti-cytokeratins 8 and 18 (clone 5D3; BioGenex, San Ramon CA). Whole immunostained slides were digitally scanned and montaged using Surveyor© Automated Specimen Scanning, the software bundled with the automated stage control (Objective Imaging Ltd., Cambridge UK). The epithelium-to-stroma ratio was quantified using Image-Pro Plus (Media Cybernetics, Bethesda MD) image analysis software. The epithelial and stromal percentages were defined as the area of CK5D3-positive tissue and of hematoxylin-counterstained tissue, respectively, thresholded in pseudo-color within the diagnostic regions of interest (ROIs), relative to the total area of the tissue. Figure 2(a) shows fitted spectra sampled from normal, benign and malignant tissues. Figure 2(b) illustrates how epithelium, stroma and fat content vary between normal, benign and malignant samples based on this analysis.
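The area-fraction calculation can be sketched with boolean masks, as a rough stand-in for the thresholding done in Image-Pro Plus. This is a hypothetical illustration: the masks would come from pseudo-color thresholding of the scanned slides, CK-positive pixels are excluded from the stromal count so they are not counted twice, and the adipose fraction is assumed (not stated in the text) to be the ROI area left unstained, since fat appears as empty space after processing.

```python
import numpy as np

def composition_fractions(ck_positive, counterstained, roi):
    """Percent epithelium, stroma and (assumed) adipose within an ROI.

    ck_positive, counterstained, roi: boolean masks of equal shape
    (hypothetical outputs of colour thresholding)."""
    roi = np.asarray(roi, dtype=bool)
    epithelium = np.asarray(ck_positive, dtype=bool) & roi
    # Counterstained tissue that is not CK-positive is counted as stroma
    stroma = np.asarray(counterstained, dtype=bool) & ~epithelium & roi
    # ASSUMPTION: unstained area inside the ROI is treated as adipose
    adipose = roi & ~epithelium & ~stroma
    total = roi.sum()
    return {name: float(100.0 * mask.sum() / total)
            for name, mask in (("epithelium", epithelium),
                               ("stroma", stroma),
                               ("adipose", adipose))}

# Toy 10x10 ROI: rows 0-1 CK-positive, rows 0-5 counterstained
roi = np.ones((10, 10), dtype=bool)
ck = np.zeros((10, 10), dtype=bool)
ck[:2] = True
hema = np.zeros((10, 10), dtype=bool)
hema[:6] = True
print(composition_fractions(ck, hema, roi))
# {'epithelium': 20.0, 'stroma': 40.0, 'adipose': 40.0}
```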


[Figure 2: (a) normal, benign and malignant spectra with model fits, reflectance versus wavelength (500-800 nm); (b) stroma, epithelium and adipose content for normal, benign and malignant samples]

Figure 2 (a) Fitted spectra sampled from normal, benign and malignant tissues. (b) Distribution of stroma, epithelium and adipose across the three diagnostic categories, as quantified by immunohistochemistry for all tissue samples.
Task 2. Develop an automated classification algorithm to provide real-time, unbiased interpretation of scatter images for improved evaluation of breast surgical margins.

Materials  and  Methods 

2a.  k-Nearest  Neighbor  Classification 

The distribution of scattering parameters demonstrated subtle discrimination between tissue subtypes, but the data were multi-parametric and overall classification was challenging. A k-Nearest Neighbor (k-NN) classifier was therefore employed to discriminate between tissue pathologies. The k-NN classifier interprets multiple scattering parameters simultaneously for tissue characterization by assigning an unclassified vector (herein referred to as the query point) to the majority diagnosis of its k nearest vectors in the feature space. The feature space for three scatter-related parameters (scattering amplitude, scattering power, and total wavelength-integrated intensity) is depicted in Figure 1(d), together with a query point of unknown diagnosis. Each tissue pixel was represented by a vector in the 3-dimensional feature space and assigned either to the training set (populated feature space) or to the validation set (query points). Samples were distributed between training and validation sets both randomly and according to a leave-one-out analysis per patient24. All training pixels were associated with a known diagnosis according to the pathologist's demarcation of ROIs. The diagnosis of each query point was also determined by the pathologist, but remained unknown to the classifier in order to evaluate its performance.
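The majority-vote rule described above can be sketched in a few lines of NumPy. This is a brute-force illustration under assumptions not stated in the text: Euclidean distance, no feature scaling, and hypothetical 3-D vectors standing in for (amplitude, power, log intensity). An odd k avoids most voting ties for two classes.

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, query_X, k=5):
    """Assign each query point the majority diagnosis of its k nearest
    training vectors (Euclidean distance, brute-force search)."""
    train_X = np.asarray(train_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    train_y = np.asarray(train_y)
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = ((query_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        votes = Counter(train_y[row].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Hypothetical feature vectors for two well-separated diagnoses
rng = np.random.default_rng(0)
normal = rng.normal([1.0, 1.0, 0.0], 0.1, size=(50, 3))
malignant = rng.normal([2.0, 0.5, 1.0], 0.1, size=(50, 3))
train_X = np.vstack([normal, malignant])
train_y = np.array(["normal"] * 50 + ["malignant"] * 50)
queries = np.array([[1.05, 0.95, 0.02], [1.95, 0.55, 0.97]])
print(knn_classify(train_X, train_y, queries, k=5))  # ['normal' 'malignant']
```

Brute-force distances scale with the size of the training set; a tree-based search is a common alternative for large feature spaces.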

Additional feature extraction from the data set has been shown to compensate for pixel-to-pixel variations and to improve the overall performance of the classifier25. Therefore, the first four statistical moments (mean, standard deviation, skewness and kurtosis) of each scattering parameter were estimated in a 2-dimensional spatial window centered on each pixel of interest. These local statistical parameters were concatenated with the scatter parameters themselves, expanding the parametric feature space from 3 to 15 dimensions. The behavior of the classifier was then studied as a function of two independent variables: the number of nearest neighbors, k, and the size of the spatial window used to compute local statistics.
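The 3-to-15 feature expansion can be sketched with `scipy.ndimage.generic_filter`, which applies a function to every pixel's neighbourhood. This is an assumption-laden illustration: a square window, reflected boundaries, population standard deviation, and SciPy's default excess (Fisher) kurtosis are all choices made here, since the text does not specify these conventions.

```python
import numpy as np
from scipy import ndimage, stats

def local_moment_features(param_maps, window=5):
    """Expand per-pixel scatter parameters with local statistics.

    param_maps: array (n_params, H, W), e.g. amplitude, power, log intensity.
    Returns (H, W, 5 * n_params): for each parameter, the raw value followed
    by its local mean, standard deviation, skewness and excess kurtosis in a
    window x window neighbourhood centred on each pixel."""
    features = []
    for p in np.asarray(param_maps, dtype=float):
        features.append(p)
        for moment in (np.mean, np.std, stats.skew, stats.kurtosis):
            features.append(ndimage.generic_filter(p, moment, size=window, mode="reflect"))
    return np.stack(features, axis=-1)

rng = np.random.default_rng(0)
maps = rng.normal(size=(3, 12, 12))  # hypothetical 3-parameter images
feats = local_moment_features(maps, window=5)
print(feats.shape)  # (12, 12, 15)
```

The window size is the second independent variable studied; calling the function with different `window` values produces the corresponding feature sets.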

2b.  Validation  of  the  Classifier 

To optimize the independent variables associated with the classifier, a threefold cross-validation technique27,28 was applied both for discrimination between not-malignant, malignant and adipose samples and for discrimination between all pathology subtypes identified in Table 1. All data were randomly divided into three non-overlapping sets, with an equal number of pixels per diagnostic category per set. Two of these sets were employed as a training set (used to populate feature space) and the third as a validation set (query points) to compute the classification error. Error was taken to be the percentage of misclassified pixels in the validation set, where a misclassification means that the diagnosis assigned to a pixel by the automated classifier does not match the diagnosis provided by the pathologist. This procedure was repeated three times, once for each choice of validation set, and the reported classification error is the average of these three executions. This threefold cross-validation was repeated for a varying number of nearest neighbors, k, and a varying spatial window size for computation of local statistics.
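The threefold procedure can be sketched as a stratified split, in which each diagnostic category is divided evenly across the three sets, followed by a sweep over k. This is a hypothetical sketch: integer class labels, a KD-tree neighbour search, and synthetic 15-D features are assumptions, and the outer loop over window sizes is omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_predict(train_X, train_y, query_X, k):
    """Majority vote among k nearest neighbours; labels are non-negative ints."""
    _, idx = cKDTree(train_X).query(query_X, k=k)
    idx = np.asarray(idx).reshape(len(query_X), -1)  # k=1 returns a 1-D array
    return np.array([np.bincount(v).argmax() for v in train_y[idx]])

def threefold_cv_error(X, y, k, seed=0):
    """Mean % misclassification over three folds, each holding an equal
    share of every diagnostic category."""
    rng = np.random.default_rng(seed)
    folds = [[], [], []]
    for label in np.unique(y):
        members = rng.permutation(np.flatnonzero(y == label))
        for f, part in enumerate(np.array_split(members, 3)):
            folds[f].append(part)
    folds = [np.concatenate(f) for f in folds]
    errors = []
    for i in range(3):  # each set serves once as the validation set
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(3) if j != i])
        pred = knn_predict(X[train], y[train], X[test], k)
        errors.append(100.0 * np.mean(pred != y[test]))
    return float(np.mean(errors))

# Hypothetical 15-D features for three classes (0, 1, 2)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(90, 15)) for c in (0.0, 1.5, 3.0)])
y = np.repeat([0, 1, 2], 90)
errors_by_k = {k: threefold_cv_error(X, y, k) for k in (1, 3, 5, 7)}
print(errors_by_k)
```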

Additionally, a leave-one-out analysis was performed per patient, in which ROIs from one tissue sample populate the validation set and all other ROI pixels populate the feature space. In this validation procedure, points are not equally distributed between diagnostic categories in either the training or the testing sets. Images of the classification results were generated in H&E false color for each tissue sample, allowing one to evaluate whether the predicted diagnosis outside the selected ROIs makes sense in the context of the entire sample. A mode filter was applied in a sliding window (5×5 pixels) over the k-NN classified image to eliminate impulsive assignment noise. The error and efficacy of the classifier were summarized for all tissue samples. Pixels at locations where reflectance spectra could not be reliably measured were tagged as masked pixels and excluded from the training and validation sets during all cross-validation procedures.
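The 5×5 mode filter can be sketched with `scipy.ndimage.generic_filter`. In this illustration, masked pixels carry a hypothetical label of -1, are ignored when voting, and remain masked in the output; ties are resolved toward the lowest label index, which is an arbitrary choice not described in the text.

```python
import numpy as np
from scipy import ndimage

MASKED = -1  # hypothetical label for pixels without a reliable spectrum

def _window_mode(values):
    """Most frequent non-masked label in the window (ties -> lowest label)."""
    labels = values[values != MASKED].astype(np.int64)
    if labels.size == 0:
        return MASKED
    return np.bincount(labels).argmax()

def mode_filter(label_img, size=5):
    """Sliding-window mode filter over a classified image; masked pixels stay masked."""
    label_img = np.asarray(label_img, dtype=np.int64)
    smoothed = ndimage.generic_filter(label_img, _window_mode, size=size, mode="nearest")
    smoothed[label_img == MASKED] = MASKED
    return smoothed

# A single impulsive 'malignant' (1) pixel inside 'normal' (0) tissue is removed
img = np.zeros((9, 9), dtype=np.int64)
img[4, 4] = 1
img[0, 0] = MASKED
out = mode_filter(img, size=5)
print(out[4, 4], out[0, 0])  # 0 -1
```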
Results


Table  2  summarizes  the  classification  efficacy  and  classification  error  observed  when  performing 
leave-one-out  validation  for  all  tissue  samples. 


Classification error (%)     Not-malignant, Malignant, Fat    All pathologies
Median                       8.75                             16.8
Mean                         13.0                             25.3
Standard deviation           13.7                             25.0
Inter-quartile range         15.5                             25.5
[min max]                    [0 53.5]                         [2.15 95.3]

Total efficacy             | Not-malignant, Malignant, Fat |  All pathologies
                           | Not-mal.  Malig.  Fat         |  Normal  Benign  DCIS  IDC   ILC   Inflam  Fat
Accuracy                   | 0.86      0.86    0.98        |  0.74    0.85    1.00  0.86  0.99  0.99    0.98
Sensitivity                | 0.90      0.77    0.87        |  0.74    0.09    0.00  0.77  0.00  0.00    0.87
Specificity                | 0.82      0.90    0.99        |  0.74    0.91    1.00  0.90  1.00  1.00    0.99
Negative predictive value  | 0.87      0.88    0.99        |  0.77    0.92    1.00  0.89  0.99  0.99    0.99
Positive predictive value  | 0.86      0.81    0.95        |  0.71    0.08    0.00  0.79  0.00  0.00    0.95

Table 2 Summary of the classification error and total efficacy of the k-NN classifier under leave-one-out validation, when discriminating between not-malignant, malignant and adipose tissue and when discriminating between all pathologies. Reported measures are based on the ability to discriminate a given pathology from all other diagnostic categories evaluated.


The median classification error is approximately 17% when discriminating between all pathologies and 9% when discriminating between not-malignant, malignant, and adipose tissue. This is close to our performance estimates. When classifying all pathologies, low sensitivity is observed for the classes that are under-represented in the sample space (benign, DCIS, ILC, inflammation). Nevertheless, the classifier has clinical application because normal epithelium and stroma, invasive ductal carcinoma and adipose are the tissues most frequently encountered during conservative breast surgery. The classifier's sensitivity to not-malignant, malignant and adipose pathologies is 0.90, 0.77 and 0.87, respectively. Sensitivity is lower for malignant samples because this population is more heterogeneous: while DCIS, IDC and ILC are all considered malignant, they are morphologically and biologically quite distinct. Specificity of the classifier for not-malignant, malignant and adipose pathologies is 0.82, 0.90 and 0.99, respectively. Specificity is lowest for not-malignant tissues because these are characterized by mixed fibro-glandular and adipose content. Epithelial proliferation in malignant tissues was observed to crowd out adipocytes in this study. In a reflectance geometry, scattering from adipocytes results in a very low (noisy) signal. The negative and positive predictive values for each diagnostic category are also reported; these are the fractions of negative and positive classifications, respectively, that agree with the pathologist's diagnosis. For surgical margin applications, the surgeon is most interested in a high negative predictive value, ensuring that the diagnosis of normal or malignant is an accurate one. The negative predictive values for not-malignant and malignant pathologies are 87% and 88%, respectively. Although less essential, high positive predictive values prevent unnecessary re-excisions during surgery.

The confusion matrices in Table 3 illustrate the distribution of misclassified pixels across diagnostic categories under leave-one-out analysis for two levels of diagnostic discrimination. This distribution is important to consider because the cost to the patient of misclassifying a normal pixel as benign is lower than the cost of misclassifying a malignant pixel as normal.
[Table 3: confusion matrices under leave-one-out validation, (a) not-malignant, malignant and adipose, (b) all pathologies]

Table 3 (a) Confusion matrix, under leave-one-out validation, showing trends in misclassification and accurate identification of benign and malignant pathologies. (b) Confusion matrix, under leave-one-out validation, showing trends in misclassification and accurate identification of all pathologies identified by the pathologist. The matrices list the percentage of pixels classified correctly along the diagonal (yellow) and incorrectly off the diagonal. Clear misclassifications are highlighted in gray.


Figure 3 illustrates the classification of 6 representative tissue samples. Column 1 contains a digital photograph of each tissue sample taken immediately after spectral imaging; this is the surgeon's perspective. Fibro-glandular tissue is white, adipose is yellow-orange, and regions with higher hemoglobin concentration are red. Histological sections were co-registered to the scattering and white light images and are displayed in column 2. Hematoxylin has a deep blue-purple color and stains nucleic acids, which are located primarily in the cell nuclei. Eosin is pink and stains proteins nonspecifically; the cytoplasm and stroma show varying degrees of pink staining. Fat is not preserved during histological processing, so it appears as empty space on the slide. Column 3 shows the ROIs identified by the pathologist, colored according to their true diagnostic category, while column 4 shows the automated diagnosis provided by the k-NN classifier.


[Figure 3: columns labeled Tissue White Light Image, Tissue Histology, ROIs Identified by Pathologist, and Classification (Not-malignant, Malignant, Fat); legend: Fat, Malignant, Normal, Masked]

Figure 3 Classification of 6 representative tissue samples. Each row corresponds to a different tissue sample, and the following four images are co-registered from left to right: (1) a white light image of the tissue, (2) H&E stained sections of the tissue, (3) the true diagnosis of ROIs identified by the pathologist, and (4) classification results when discriminating between not-malignant, malignant and adipose tissues.
We recognize that the surgeon is most interested in a diagnosis of either benign or malignant; therefore, the k-NN classifier was also executed with this binary level of discrimination. When separating benign and malignant pathologies, the sensitivity and specificity were 91% and 77%, respectively, and the system achieved positive and negative predictive values of 88% and 81%, respectively.

Discussion 

This work demonstrates that morphological features pertinent to a tissue's diagnosis may be ascertained from confocal detection of broadband reflectance, with a mesoscopic resolution that permits scanning of an entire margin for residual disease. The technical aspects and optimization of a k-NN classifier for automated diagnosis of pathologies are presented and validated in 29 specimens of breast tissue. The classifier's discriminating capabilities improved with the inclusion of local statistics, likely because these account for microscopic tissue heterogeneities. Initially, discrimination between all pathologies identified by the pathologist (WAW) was attempted; however, inadequate sampling of uncommon pathologies rendered their classification less robust. Given the sample population, discrimination between not-malignant, malignant and adipose pathologies was most intuitive, particularly because these diagnostic categories correspond to the three macroscopic scattering centers in breast tissue (stroma, epithelium and adipocytes). Negative predictive values of 87%, 88% and 99% were achieved for not-malignant, malignant and adipose tissues, respectively. In the same order, the positive predictive values were 86%, 81% and 95%. The classifier was most sensitive to not-malignant and adipose tissues because the malignant population was pathologically very diverse, including samples of invasive lobular carcinoma, invasive ductal carcinoma and ductal carcinoma in situ. Specificity was lowest in not-malignant samples because of their mixed fibro-glandular and adipose content. In the context of conservative surgery, the goal of treatment is to maximize removal of malignant tissues while minimizing damage to healthy, viable tissue. When discriminating between benign and malignant tissues only, a sensitivity of 91% and a specificity of 77% were achieved. This sensitivity is higher than those reported for frozen section analysis (which is not practical for lumpectomy margins because of the problems associated with freezing and cutting adipose tissue) and for diffuse reflectance spectroscopy, although its specificity is lower29-31. Even though the overall efficacy of the classifier exceeds or is comparable to that of other intra-operative assessment techniques, integration of spectroscopy methods into the surgical suite will require a better negative predictive value, ensuring that the surgeon's diagnosis of benign or malignant is an accurate one. Additionally, higher positive predictive values would prevent unnecessary tissue removal during surgery.

Improvement of the classifier's performance may fundamentally be achieved with greater sampling: as the number of data points in feature space increases, so does the accuracy of the classifier (Fukunaga, 1972). In particular, the classifier's sensitivity to malignant pathologies has the most to gain and could be improved with equal representation of IDC, ILC and DCIS in feature space. As feature space expands, so will computational costs. Rather than calculating the distance between each query point and every point in feature space, a KD-tree may be employed to accelerate the nearest-neighbor search32. The classifier was trained with ROIs that clearly belonged to a diagnosis: normal and malignant pathologies were identified by cellular features in fibro-glandular regions, and adipose ROIs were measured far from any fibro-glandular tissue. The performance of the classifier might improve if an additional level of classification were employed, so that each diagnostic category was also labeled according to its subtype, 'fibro-glandular' or 'fatty-fibro-glandular'.
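A KD-tree search can be sketched with SciPy's `cKDTree`: the tree is built once over the training vectors and then queried for the k nearest neighbours of each query point, avoiding a full distance computation against every stored vector. The sizes and random 15-D vectors below are hypothetical, and the brute-force comparison only checks that both searches return the same neighbour distances. Note that KD-tree speed-ups shrink as dimensionality grows; `query` also accepts an `eps` argument for approximate search if exactness can be traded for speed.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
feature_space = rng.normal(size=(5000, 15))  # hypothetical training vectors
queries = rng.normal(size=(200, 15))         # hypothetical query points
k = 7

# KD-tree: built once, then reused for every query point
tree = cKDTree(feature_space)
tree_dist, tree_idx = tree.query(queries, k=k)

# Brute-force reference: every query-to-training squared distance
d2 = ((queries ** 2).sum(axis=1)[:, None]
      + (feature_space ** 2).sum(axis=1)[None, :]
      - 2.0 * queries @ feature_space.T)
brute_dist = np.sqrt(np.sort(np.maximum(d2, 0.0), axis=1)[:, :k])

print(np.allclose(tree_dist, brute_dist))  # True
```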

While mechanically scanning the sample was time-intensive and therefore not suitable for clinical translation, a second-generation system has been developed that employs a scanning-beam architecture to image tissue fields up to 1 × 1 cm² within 9-12 minutes. The high-throughput platform combines a broadband telecentric scanning design with a dark-field illumination/detection optical path to allow efficient rejection of specular light while maintaining a consistent sampling geometry across the entire imaging field. System details can be found in reference 33; this system is now used to image breast lumpectomy specimens. To expand the variety of data collected, a supercontinuum white light source is used with the new scanning-beam spectral imager. This source allows broadband spectral imaging of breast pathology in a waveband (400-750 nm) that is sensitive to both tissue morphology and biochemical composition. In particular, within this waveband one may determine the concentrations of oxygenated and deoxygenated hemoglobin, beta-carotene and blood break-down products in a tissue, while simultaneously extracting scattering features in a region of minimal absorption (beyond 620 nm). Adding other optical parameters to the classifier is straightforward, involving only an expansion of the feature space (the vector describing each pixel is extended to N dimensions, where N is the number of parameters). It would be particularly useful to generate a comprehensive dataset with parameters describing all possible endogenous light-tissue interactions (scattering, absorption, fluorescence), so that the most diagnostically discriminating and robust parameters could be identified and optimized during data collection. Note that beta-carotene is a carotenoid that gives fat its highly pigmented yellow color; we hope that its absorption spectrum will improve classification of tissues with high adipose content.

Finally, understanding the relationship between the optical and biological properties of a tissue will ultimately improve the diagnostic utility of optical techniques, permitting optimization of the measurement procedure and signal analysis for enhanced sensitivity to differentiating features. The technique remains to be tested intra-operatively; future clinical studies will reveal how the system may enhance existing surgical and pathologic procedures.
Key Research Accomplishments

•  Validation  and  optimization  of  a  k-Nearest  Neighbor  classifier  to  automatically 
detect  breast  tissue  pathologies  based  upon  direct  sampling  of  the  scattering 
spectral  response. 

•  Assessment of classifier performance in 67,000 spectra from 29 breast tissue
specimens. When discriminating between benign and malignant pathologies, a
sensitivity and specificity of 91% and 77%, respectively, were achieved.

•  Detailed  sub-tissue  analysis  was  performed  to  consider  how  diverse  pathologies 
influence  scattering  response  and  overall  classification  efficacy. 

•  Development of a dark-field spectral imaging system for rapid scanning of thick
tissue specimens over a 1 cm² field of view.
Conclusions

In this contribution, we validate and optimize a k-NN classifier that automatically detects breast tissue pathologies based on sampling of scattering spectral features. The sampling volume was specifically chosen to be sensitive to the architectural changes assessed by a pathologist during microscopic examination of a surgical specimen, while also permitting wide-field scanning. Performance of the classifier was assessed in 29 breast tissue specimens; when discriminating between benign and malignant pathologies, a sensitivity and specificity of 91% and 77%, respectively, were achieved. Further, detailed sub-tissue analysis was performed to consider how diverse pathologies influence scattering response and overall classification efficacy. The purpose of automating classification of the scattering response from diagnostic pathologies is to assess surgical margins for cancer involvement during primary surgery. Identification of residual disease at the time of primary surgery offers clear value to the patient by decreasing re-excision rates and improving survival. If residual tumor is present at one or more margins, the surgeon could be advised to remove more tissue immediately, before closing, rather than performing a later re-excision. Optical characterization of tissue is expected to improve completeness of resection during breast-conserving surgery because the molecular interaction of tissue with light provides specific information about a tissue's biochemistry and organelle morphology, both of which are altered by disease.